1. Introduction. I would like first to thank the editors for giving me the
opportunity to discuss the paper Frequentist coverage of adaptive Bayesian
credible sets by B. Szabó, A. van der Vaart and Harry van Zanten. This is
a very significant contribution to the literature on nonparametric credible
sets. Before discussing some specific phenomena highlighted by the paper
on frequentist coverage of adaptive credible sets, I would like to emphasize
why I believe this to be a crucial problem, in particular in large or
infinite-dimensional models. Over the last ten years or so, there has been a growing
literature on frequentist properties of the posterior distribution in large or 
infinite-dimensional models. The results obtained concern mainly posterior 
concentration rates, initiated by the seminal paper of Ghosal, Ghosh and 
van der Vaart [2]. These results have shown that Bayesian nonparametric 
approaches often lead to estimates having very good frequentist properties 
such as adaptive and minimax (or near minimax with possibly a log n penalty
term) convergence rates under standard loss functions. However, one of the 
interesting aspects of Bayesian methods is that they provide much more than 
point estimates via the posterior distribution, in particular, various sorts of 
measures of uncertainty can be derived. An important question is then, 
how can we understand these measures of uncertainty? This is all the more 
important when the models are complex or large, as it becomes impossible 
(1) to elicit a subjective prior fully and (2) to assess perfectly the influence
of the prior. Hence, looking at the frequentist properties of measures of 
uncertainty is a way to answer—at least partially—these questions. 

In [3] it is observed that when the posterior distribution has concentration
rate $\varepsilon_n$ under some loss function $\ell(\cdot,\cdot)$, and if there exists an estimate $\hat\theta$, say,
the posterior mean, whose convergence rate is also $\varepsilon_n$ under the loss $\ell(\cdot,\cdot)$,
then if $r_n(\gamma)$ is defined as the posterior $1-\gamma$ quantile of $\ell(\theta,\hat\theta)$, we have

(1.1) $1-\gamma = \int P_\theta(\ell(\theta,\hat\theta) \le r_n(\gamma))\,d\pi(\theta), \qquad E_\theta(|r_n(\gamma)|) = O(\varepsilon_n)$,

where $P_\theta$ and $E_\theta$ denote, respectively, the probability and expectation under
the sampling distribution associated to the parameter $\theta$. This means that
on average the credible region $\{\theta; \ell(\theta,\hat\theta) \le r_n(\gamma)\}$ has the correct frequentist
coverage and, if $\varepsilon_n$ is the minimax rate, it also has optimal size, asymptotically.
The question is then, can we describe precisely the set of parameters
$\theta$ such that

(1.2) $P_\theta(\ell(\theta,\hat\theta) \le r_n(\gamma)) \ge 1-\gamma$

or for which this coverage is at least large enough? In their paper, the authors answer a similar
question by allowing $r_n(\gamma)$ to be inflated by a constant $L$, possibly large, in
the special case of a Gaussian white noise model with an adaptive empirical 
Bayes Gaussian prior. 
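
To make the question behind (1.2) concrete, the following minimal Python sketch estimates the frequentist coverage of a credible ball by simulation. It is an illustration under assumptions that are not part of the discussion above: a truncated Gaussian sequence model $X_i = \theta_i + n^{-1/2} Z_i$, a Gaussian prior $\theta_i \sim N(0, i^{-1-2\alpha})$ with a fixed (non-adaptive) $\alpha$, the $\ell_2$ loss, and the posterior mean as $\hat\theta$. All function names are hypothetical.

```python
import numpy as np


def posterior_moments(x, n, alpha):
    """Conjugate posterior mean and variance, coordinate by coordinate.

    Assumed model: X_i = theta_i + Z_i / sqrt(n), prior theta_i ~ N(0, i^(-1-2*alpha)).
    """
    i = np.arange(1, x.shape[-1] + 1)
    prior_precision = i ** (1.0 + 2.0 * alpha)
    mean = n * x / (n + prior_precision)
    var = 1.0 / (n + prior_precision)
    return mean, var


def credible_radius(var, gamma, rng, draws=2000):
    """Monte Carlo estimate of the posterior (1 - gamma) quantile of ||theta - theta_hat||_2."""
    xi = rng.standard_normal((draws, var.size))
    dist = np.sqrt((var * xi**2).sum(axis=1))
    return np.quantile(dist, 1.0 - gamma)


def coverage(theta, n, alpha, gamma=0.05, L=1.0, reps=500, seed=0):
    """Fraction of simulated data sets whose (inflated) credible ball contains theta."""
    theta = np.asarray(theta, dtype=float)
    rng = np.random.default_rng(seed)
    # In this conjugate model the posterior variance does not depend on the data,
    # so the radius can be computed once.
    _, var = posterior_moments(theta, n, alpha)
    radius = L * credible_radius(var, gamma, rng)
    x = theta + rng.standard_normal((reps, theta.size)) / np.sqrt(n)
    mean, _ = posterior_moments(x, n, alpha)
    dist = np.sqrt(((mean - theta) ** 2).sum(axis=1))
    return float((dist <= radius).mean())
```

For a true parameter much rougher than the prior, for instance `theta = i ** -0.75` with `alpha = 2`, the estimated coverage from this sketch should be close to zero, whereas for `theta = 0` it should stay at or above the nominal level.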

2. On polished tail parameters. A key notion in this paper is that of
polished tail parameters $\theta \in \ell_2$,

$\sum_{j=N}^{\infty} \theta_j^2 \le L_0 \sum_{j=N}^{\rho N} \theta_j^2$ for all $N \ge N_0$,

where $N_0$, $L_0$ and $\rho$ are fixed. This definition is not intrinsic to the function
$f = \sum_j \theta_j \phi_j$ which is to be estimated, since $f$ can lead to a polished tail
parameter $\theta$ when represented in some orthonormal basis $(\phi_j)_j$ and to a
non-polished tail parameter $\gamma$ in some other orthonormal basis. Although this
might seem disturbing at first, I think this is inherent to the Bayesian
approach. Let $\pi$ be a prior on some functional set containing a collection of
Hölder balls or Sobolev balls which leads to adaptive minimax posterior
concentration rates over this collection. Then, since there exists no adaptive
confidence region in $L_2$ and probably in many other metrics on $f$, the set of
good parameters (for which credible regions have good frequentist coverage)
is necessarily a subset of $\Theta$. The question is which subset? Bull and Nickl [1]
(among others) construct procedures such that the subset of badly behaved 
parameters (those that either do not lead to good coverage or do not lead 
to optimal size) is rendered as small as possible. But these procedures
require artificially withdrawing the badly behaved points from the confidence
regions, which is not entirely satisfying. In their case, however, the
definition of the well-behaved parameters does not depend on the representation
of the function $f$ in some particular basis. In the paper of Szabó et al. the
advantage is that the credible region is constructed using standard methods
and it has good frequentist properties for a subset of parameters. Hence, a
natural question arises: Is it possible to construct a prior whose set of
well-behaved parameters is similar to that of Bull and Nickl [1], without having
to modify the credible region by taking out the badly behaved parameters?
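
The polished tail condition stated at the start of this section can be checked numerically on a finite stretch of coefficients. The Python sketch below is only illustrative: it assumes the sequence is truncated at `len(theta)` coefficients (so only indices $N$ with $\rho N \le$ `len(theta)` are examined), an integer $\rho$, and hypothetical function names. A finite check of this kind cannot certify the condition for the full sequence.

```python
import numpy as np


def polished_tail_constant(theta, rho=2, n0=1):
    """Smallest L0 for which the truncated sequence satisfies the polished tail
    inequality for every N with n0 <= N and rho * N <= len(theta).

    Returns inf if some block sum is zero while the corresponding tail sum is not.
    """
    sq = np.asarray(theta, dtype=float) ** 2
    m = sq.size
    # cs[k] = sum of theta_j^2 over j = 1..k (1-based indices), with cs[0] = 0.
    cs = np.concatenate(([0.0], np.cumsum(sq)))
    worst = 0.0
    for N in range(n0, m // rho + 1):
        tail = cs[m] - cs[N - 1]          # sum over j = N..m
        block = cs[rho * N] - cs[N - 1]   # sum over j = N..rho*N
        if tail == 0.0:
            continue
        if block == 0.0:
            return float("inf")
        worst = max(worst, tail / block)
    return worst


def is_polished_tail(theta, L0, rho=2, n0=1):
    """True if the truncated sequence satisfies the inequality with constant L0."""
    return polished_tail_constant(theta, rho, n0) <= L0
```

For a polynomially decaying sequence such as $\theta_j = j^{-1}$ the returned constant stays bounded (below 2 when `rho = 2`), whereas a sequence with long runs of zero coefficients followed by nonzero ones returns infinity.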

3. On the generalization of the results. More importantly, I think that 
even though the results are presented in a very specific context, they pave 
the way for controlling frequentist coverage of posterior credible regions. 
Indeed, consider an (inflated) credible region in the form 

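$C_\alpha = \{\theta; \ell(\theta,\hat\theta) \le L r_n(\gamma)\}$,

where $r_n(\gamma)$ is the posterior $1-\gamma$ quantile of $\ell(\theta,\hat\theta)$ and $\hat\theta$ is some minimax
(adaptive) estimator with rate $\varepsilon_n(\theta)$ (the dependence on $\theta$ is here to
emphasize the adaptation property). Under the condition of posterior
concentration rate $\varepsilon_n(\theta)$ at $\theta$,

$E_\theta[r_n(\gamma)] \lesssim \varepsilon_n$;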

see [3] for details on this result. So the only thing that remains to be verified 
is that 

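(3.1) $\liminf_n \inf_{\theta\in\Theta_0} P_\theta[\theta \in C_\alpha] \ge 1-\gamma$

for some well-identified subset $\Theta_0$ of $\Theta$. If the posterior distribution satisfies

(3.2) $\Pi(\theta; \ell(\theta,\hat\theta) > \delta\varepsilon_n(\theta) \mid X) > \alpha + o_{P_\theta}(1)$

uniformly on $\Theta_0$, then $P_\theta(r_n(\gamma) > \delta\varepsilon_n(\theta)) = 1 + o(1)$ uniformly on $\Theta_0$ and

$P_\theta(\theta \in C_\alpha) \ge P_\theta(\ell(\theta,\hat\theta) \le L\delta\varepsilon_n(\theta)) + o(1)$,

and we can choose $L = 1/\delta$ to ensure (3.1). So the main difficulty is to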
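
The step from the lower bound on $r_n(\gamma)$ to the coverage bound can be spelled out as follows. This is a sketch in the notation of this section, writing $A_n = \{r_n(\gamma) > \delta\varepsilon_n(\theta)\}$ (a symbol introduced here only for the derivation); it uses nothing beyond an inclusion of events and the union bound.

```latex
% A_n is an auxiliary event introduced for this derivation only.
\begin{align*}
A_n \cap \{\ell(\theta,\hat\theta) \le L\delta\varepsilon_n(\theta)\}
  &\subset \{\ell(\theta,\hat\theta) \le L r_n(\gamma)\} = \{\theta \in C_\alpha\}, \\
P_\theta(\theta \in C_\alpha)
  &\ge P_\theta\bigl(\ell(\theta,\hat\theta) \le L\delta\varepsilon_n(\theta)\bigr) - P_\theta(A_n^c) \\
  &= P_\theta\bigl(\ell(\theta,\hat\theta) \le L\delta\varepsilon_n(\theta)\bigr) + o(1),
\end{align*}
% where the last line uses P_\theta(A_n) = 1 + o(1), uniformly on \Theta_0.
```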
verify (3.2). When $\hat\theta$ is the posterior mean and $\ell(\cdot,\cdot)$ is the $L_2$ loss, as in
the present paper, this boils down to bounding from below the trace of the
posterior variance. This typically requires that the posterior distribution
asymptotically lives in a space that has an effective dimension large enough
so that the bias of $\hat\theta$ is of the same order as its variance. The polished tail
condition ensures this when the prior is based on a sequence parameter
$\theta \in \ell_2$.
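
As an illustration of the trace bound, consider the simplest conjugate case, which is an assumption made here for concreteness rather than the adaptive setting of the paper: the sequence model $X_i = \theta_i + n^{-1/2} Z_i$ with prior $\theta_i \sim N(0, i^{-1-2\alpha})$ and $\alpha > 0$ fixed. The posterior variance is then explicit and does not depend on the data:

```latex
\begin{align*}
\operatorname{Var}(\theta_i \mid X) &= \frac{1}{n + i^{1+2\alpha}}, \\
\operatorname{tr}\,\operatorname{Var}(\theta \mid X)
  &= \sum_{i \ge 1} \frac{1}{n + i^{1+2\alpha}}
  \asymp n^{-2\alpha/(1+2\alpha)} .
\end{align*}
```

The sum is driven by the roughly $n^{1/(1+2\alpha)}$ coordinates with $i^{1+2\alpha} \le n$, which is one way to read the effective dimension mentioned above. In the empirical Bayes setting the same expression holds with the estimated $\hat\alpha$ plugged in, so a lower bound on the trace amounts to showing that $\hat\alpha$ is not too large.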

Obviously, this is a rough description of the underlying mechanisms and 
this does not take into account some more subtle aspects of the paper. 
For instance, the maximum marginal likelihood empirical Bayes approach 
is used to simplify the computations since the empirical Bayes distribution 
remains Gaussian, while leading to an adaptive posterior distribution. From
the above comments it seems that hierarchical posterior distributions could
be treated using similar ideas, though less directly. However, how influential
are some of the specific aspects of this empirical Bayes procedure? If instead
of maximizing the marginal likelihood in $\alpha$, the posterior had been computed
by considering a family of priors in the form

$\theta_i \sim \mathcal{N}(0, \tau\, i^{-1-2\alpha})$

and either considering a hierarchical procedure with a prior on $\tau$ or an empirical
Bayes procedure maximizing the marginal likelihood in $\tau$, then adaptive
posterior concentration rates would be achieved on the range $\beta \in (0, \alpha + 1/2)$
if $\beta$ represents the Sobolev smoothness of the true parameter. Coverage results
similar to those of the paper should then be obtained on the same range
$\beta \in (0, \alpha + 1/2)$. Now, if instead the prior model had the form: $f$,
conditional on $\tau$, is a Gaussian process with
kernel

and $\tau$ follows a Gamma distribution as in [4], then the posterior has an
adaptive concentration rate over collections of Hölder balls with smoothness
$\beta$, with $\beta \in (0, +\infty)$. How does this impact the behavior of credible regions?

4. How honest should a confidence region be? As I said in Section 1, the 
questions answered by the authors in this paper are important questions, as 
they help to understand some subtle effects of the prior in large dimensional 
models. However, as the nonexistence of adaptive confidence regions over
a wide collection of Sobolev or Hölder classes of functions shows, the full
minimax paradigm (i.e., having a uniform lower bound on the confidence
and an adaptive minimax upper bound on the size of the confidence region)
has its limits. One might wonder which is most important: weakening
the requirement on the confidence or on the size of the credible regions, or
considering smaller classes of functions? Somehow the adaptive Bayesian
approach naturally adapts on the size while losing slightly on the confidence 
properties of the credible regions, as shown by (1.1). Confidence regions
constructed in the frequentist literature are typically honest; however, their
sizes are not uniformly optimal. Using this as a starting point for
constructing honest confidence regions with optimal size over smaller functional
classes requires withdrawing badly behaved functions from these regions.
This leads to a somewhat artificial construction. It thus seems better to be
slightly dishonest and start with confidence regions that have optimal size
and to understand over which subclasses of functions they are honest
confidence regions. Obviously, this construction need not be Bayesian;
however, I believe that the Bayesian methodology naturally leads to such a
construction.